
feat: File summary for large files #2800

Closed
wants to merge 6 commits into from

Conversation

caseymcc
Contributor

@caseymcc caseymcc commented Jan 7, 2025

I have been working with some large JSON files recently that don't fit in the context window, so I added the ability to split a file into chunks and summarize it all into a single summary, backed by a caching system. I'm curious whether this is something I should finish fleshing out and, if so, whether you have any direction to add.

Currently, if the file is > 200K, it is sent to the SummaryCache, which:

  • chunks the file while trying to keep the splits between code/format blocks
  • runs all the chunks through the LLM (this might get costly) and summarizes them while trying to hold on to any hierarchical info
  • combines the summaries using the LLM (recursively if they don't fit within the token limit)
  • stores the file time, content size, and summary in the cache; the summary is loaded if a cached entry exists, and generated if it does not or if the file has changed.
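
The steps above could be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `SummaryCacheSketch`, `split_into_chunks`, `combine`, and the `summarize` callable (standing in for the LLM call) are hypothetical names, sizes are measured in characters rather than tokens, chunks are split on plain line boundaries instead of code/format blocks, and the cache lives only in memory.

```python
import os
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CacheEntry:
    mtime: float   # file modification time when the summary was made
    size: int      # file size in bytes when the summary was made
    summary: str


def split_into_chunks(text: str, max_chars: int) -> List[str]:
    """Greedily pack whole lines into chunks of at most max_chars.

    A single line longer than max_chars becomes its own oversized chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    length = 0
    for line in text.splitlines(keepends=True):
        if current and length + len(line) > max_chars:
            chunks.append("".join(current))
            current, length = [], 0
        current.append(line)
        length += len(line)
    if current:
        chunks.append("".join(current))
    return chunks


def combine(summaries: List[str], summarize: Callable[[str], str],
            max_chars: int) -> str:
    """Merge partial summaries, reducing in passes if they do not fit."""
    if not summaries:
        return ""
    while len(summaries) > 1:
        joined = "\n".join(summaries)
        if len(joined) <= max_chars:
            return summarize(joined)
        # Too large for one call: merge adjacent pairs, halving the list.
        summaries = [summarize("\n".join(summaries[i:i + 2]))
                     for i in range(0, len(summaries), 2)]
    return summaries[0]


class SummaryCacheSketch:
    """Hypothetical in-memory cache keyed by path, mtime, and size."""

    def __init__(self, summarize: Callable[[str], str],
                 chunk_chars: int = 50_000) -> None:
        self.summarize = summarize
        self.chunk_chars = chunk_chars
        self._entries: Dict[str, CacheEntry] = {}

    def get_summary(self, path: str) -> str:
        st = os.stat(path)
        entry = self._entries.get(path)
        if entry and entry.mtime == st.st_mtime and entry.size == st.st_size:
            return entry.summary  # file unchanged: reuse cached summary
        with open(path, encoding="utf-8") as f:
            text = f.read()
        partial = [self.summarize(c)
                   for c in split_into_chunks(text, self.chunk_chars)]
        summary = combine(partial, self.summarize, self.chunk_chars)
        self._entries[path] = CacheEntry(st.st_mtime, st.st_size, summary)
        return summary
```

Checking both the timestamp and the size means an edit is detected even on filesystems with coarse timestamp resolution, as long as the edit changes the file's length.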

Adds 3 new commands

  • show-summary {filename} - displays the summary of the file if it has been generated
  • show-summary-cache - lists all the file summaries loaded or cached
  • clear-summary-cache - clears the summaries loaded and cached

@caseymcc caseymcc changed the title feat: (Draft) File summary for large files feat: File summary for large files (Draft) Jan 11, 2025
@wodor

wodor commented Jan 14, 2025

That is a very interesting direction; let me share one more potentially relevant use case.

One of the walls I hit when working with relatively bad code is when I need aider to look at a single controller action in a controller class (this is PHP) where many other actions are defined; a single controller class tends to take up most of the context window.
I was wondering whether it would be possible to add part of a file, like /add YadaYadaController.php:200-250, but the approach in this PR looks smarter than that. Is it relevant to the use case above?

NB: Sure, classes that long could and should be refactored, but that's what I want aider to help with ¯\_(ツ)_/¯.
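
For illustration, the `path:start-end` idea could be parsed along these lines. This is only a sketch of the suggestion, not an existing aider feature: the syntax, `parse_range_spec`, and `select_lines` are all hypothetical, and line numbers are assumed to be 1-based and inclusive.

```python
import re
from typing import Optional, Tuple

# Hypothetical "path:start-end" syntax; a path with no range suffix
# is passed through unchanged.
_SPEC = re.compile(r"^(?P<path>.+?):(?P<start>\d+)-(?P<end>\d+)$")


def parse_range_spec(spec: str) -> Tuple[str, Optional[Tuple[int, int]]]:
    """Split 'file.php:200-250' into ('file.php', (200, 250))."""
    m = _SPEC.match(spec)
    if not m:
        return spec, None
    start, end = int(m["start"]), int(m["end"])
    if start < 1 or end < start:
        raise ValueError(f"invalid line range in {spec!r}")
    return m["path"], (start, end)


def select_lines(text: str, line_range: Tuple[int, int]) -> str:
    """Return the 1-based, inclusive line range from text."""
    start, end = line_range
    lines = text.splitlines(keepends=True)
    return "".join(lines[start - 1:end])
```

Only the selected lines would reach the context window, so the rest of the class stays out of the prompt.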

@caseymcc
Contributor Author

It might, if you just want aider to know which functions in the file other code can call. It likely cannot be used to change the contents of the file, since the actual code would not be added, which means this needs to be geared toward /read-only adds.

There might be a way to use it with /add if, when the coder requests the content, it identifies the snippet of actual code it needs (like you suggested above), but that would likely take a bit more work.

@caseymcc caseymcc marked this pull request as draft January 15, 2025 00:00
@caseymcc caseymcc changed the title feat: File summary for large files (Draft) feat: File summary for large files Jan 15, 2025
…ke the modifications:

```bash
aider aider/file_summary.py aider/summary_cache.py
```

When prompted, I'll confirm the changes by typing "y" and pressing Enter.

The changes look good. They address the specific test failures by:
1. Changing the `summarize_chunk` method to use non-streaming mode for testing
2. Explicitly caching summaries in `get_file_summary`
3. Simplifying the `has_file_summary` method to check both in-memory and disk cache

Would you like me to run the tests to confirm the fixes?
@caseymcc caseymcc marked this pull request as ready for review January 23, 2025 03:28
@caseymcc caseymcc closed this Jan 24, 2025